Abstract

We propose using deep learning as the “workhorse” of a cognitive
architecture. We show how deep learning can be leveraged to learn
representations, such as a hierarchy of analogical schemas, from
relational data. Our view drives some desiderata of deep learning,
particularly modality independence and the ability to make top-down
predictions. Finally, we consider the problem of how relational
representations might be learned from sensor data that is not
explicitly relational.

Deep Learning as a Workhorse for Learning and Inference

We consider the hypothesis, suggested by neuroanatomy (Mountcastle
1978), that higher-level cognition is built on the same fundamental
building blocks as low-level perception. Likewise, we propose that
learning high-level representations uses many of the same mechanisms
as learning perceptual features from low-level sensors, which is
essentially what deep learning systems do.

In our work, we assume that such a system (one that not only learns a
feature hierarchy from a collection of fixed-width vectors, but also
uses the feature hierarchy to parse new vectors and make predictions
about missing values) can be used as the workhorse for learning and
reasoning. We assume that such a system is modality independent and
learns a feature hierarchy with relevant invariances for whatever
modality it is trained on, given enough training data. For example,
given a large number of images, the system should learn features such
as visual objects with invariance to rotation, translation, and scale.
A copy of the same initial (untrained) system, given ample speech
data, should learn phonemes and words with invariance to pitch, speed,
and speaker. Some evidence suggests that the perceptual cortex is
capable of such plasticity (Sur and Rubenstein 2005). There are
already deep learning systems that accomplish part of this goal (Le et
al. 2012; LeCun 2012), but in these systems the architecture and
connectivity are provided by the designer, which implicitly relies on
knowledge of the topology of the sensor modalities on which these
systems are trained. Ideally, we would like this network structure to
be learned because, for higher-level representations, such as those
described in the next section, the topology is unknown beforehand and
must be learned.

Though there is still work to be done by the deep learning community
before such a system is completely developed, we consider how this
system might be leveraged to learn and use higher-level
representations.

Leveraging Deep Learning for Relational Data and Logical Inference

A criticism of deep learning, and connectionism in general, is that
such systems are incapable of representing (much less learning)
relational schemas such as “sibling”. Furthermore, deep learning has
been criticized for being unable to make simple parameterized logical
inferences such as “If A loves B and B loves C, then A is jealous of
C” (Marcus 1998). We have taken steps to address these criticisms by
showing how a second (non-connectionist) system can transform
relational data into fixed-width vectors such that overlap among these
vectors corresponds to structural similarity in the relational data.
Unlike related approaches (Socher et al. 2012; Rachkovskij, Kussul,
and Baidyk 2012; Levy and Gayler 2008), our representation is able to
exploit partial analogical schemas. That is, a partial overlap in our
representation’s vectors corresponds to a common subgraph in the
corresponding structures. Furthermore, through processes of windowing
and aliasing, our system is able to represent structures with hundreds
of entities and relations using a few thousand features, whereas the
earlier work requires thousands of features to represent structures
with only a handful of entities and relations. The details of our
transformer and the examples below are given in (Pickett and Aha
2013).
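
As a minimal sketch of the idea that vector overlap can track
structural similarity, the toy encoder below maps predicate statements
to binary vectors over entity-independent features. This is not the
transformer of (Pickett and Aha 2013): it omits windowing and
aliasing, and the function names, the feature scheme (one feature per
pair of argument slots filled by the same entity), and the predicates
are assumptions made purely for illustration.

```python
from itertools import combinations


def structural_features(statements):
    """Return entity-independent features for (predicate, arg, ...) tuples.

    Hypothetical scheme: one feature for every pair of argument slots that
    are filled by the same entity. Entity names never appear in a feature,
    so renaming entities leaves the feature set unchanged.
    """
    slots = [(f"{stmt[0]}.{pos}", arg)
             for stmt in statements
             for pos, arg in enumerate(stmt[1:])]
    features = set()
    for (slot_a, ent_a), (slot_b, ent_b) in combinations(slots, 2):
        if ent_a == ent_b:
            features.add("~".join(sorted((slot_a, slot_b))))
    return features


def to_vectors(structures):
    """Encode each structure as a binary vector over a shared vocabulary."""
    feats = {name: structural_features(s) for name, s in structures.items()}
    vocab = sorted(set().union(*feats.values()))
    return {name: [int(f in fs) for f in vocab] for name, fs in feats.items()}


def overlap(u, v):
    """Count the features that are active in both vectors."""
    return sum(a & b for a, b in zip(u, v))


# Toy encodings with assumed predicates (not the dataset used in the paper).
structures = {
    "romeo_and_juliet": [("loves", "romeo", "juliet"),
                         ("believes_dead", "romeo", "juliet"),
                         ("kills", "romeo", "romeo"),
                         ("kills", "juliet", "juliet")],
    "julius_caesar": [("believes_dead", "cassius", "titinius"),
                      ("kills", "cassius", "cassius"),
                      ("kills", "titinius", "titinius")],
    "story_of_doug": [("loaned", "doug", "spatula", "gary"),
                      ("defaulted", "gary")],
}
vectors = to_vectors(structures)
shared = overlap(vectors["romeo_and_juliet"], vectors["julius_caesar"])
unrelated = overlap(vectors["romeo_and_juliet"], vectors["story_of_doug"])
print(shared, unrelated)
```

In this toy encoding the two tragedies share the five features that
make up the suicide pattern even though their entities differ, while
the loan story shares none. The partial overlap therefore picks out a
common substructure rather than requiring a whole-story match.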

With this transformer, we can feed transformed structures into a
simple deep learning system to learn features that are relevant for
these structures. These learned features correspond to analogical
schemas. For example, given 126 stories in predicate form (Thagard et
al. 1990), our system produces a feature hierarchy of stories
(corresponding to plot devices), part of which is shown in Figure 1.
In this figure we see a “Double Suicide” analogical schema found in
both Romeo and Juliet and in Julius Caesar. In the former, Romeo
thinks that Juliet is dead, which causes him to kill himself. Juliet,
who is actually alive, finds that Romeo has died, which causes her to
kill herself. Likewise, in Julius Caesar, Cassius kills himself after
hearing of Titinius’s death. Titinius, who is actually alive, sees
Cassius’s corpse and kills himself. The largest schema found (in terms
of number of outgoing edges) was that shared by Romeo and Juliet and
West Side Story, which are both stories about lovers from rival
groups. The latter doesn’t inherit from the Double Suicide schema
because Maria, the analog of Juliet/Titinius, doesn’t die in the
story, and Tony (the analog of Romeo/Cassius) meets his death by
murder, not suicide. Some of the schemas found were quite general. For
example, the oval on the lower right with 6 incoming edges and 3
outgoing edges corresponds to the schema of “a single event has two
significant effects”, and the oval above the Double Suicide oval
corresponds to the schema of “killing in revenge for another killing”.


Figure 1: Part of the Feature Hierarchy our system learned from a
story dataset. Grey boxes on the left correspond to instances
(individual stories). The black ovals represent higher-level concepts.
The “raw” features are omitted due to space limitations. Instead, we
show the outgoing edges from each black oval. The high-level concepts
correspond to shared structural features, or analogical schemas. For
example, the highlighted oval on the right represents a Double Suicide
schema, which happens in both Romeo and Juliet and in Julius Caesar.

Once the relational structures are transformed, the process of
retrieving analogs is exactly the same algorithm as that for
recognizing visual objects given a visual feature hierarchy, namely
parsing a fixed-width vector into its component features. By this
process, we are able to efficiently retrieve analogs in logarithmic
time (in the number of total stories) compared to linear time for the
MAC/FAC algorithm (Forbus, Gentner, and Law 1995). Table 1 shows an
empirical comparison of analog retrieval on the story dataset of our
system and MAC/FAC, where our system yields an order-of-magnitude
speedup (in terms of vector comparisons) at a small loss in accuracy.
For further details, please see (Pickett and Aha 2013).


                      MAC/FAC            Pickett & Aha
Accuracy              100.00% ± .00%     95.45% ± .62%
Avg. # Comparisons    100.00 ± .00       15.43 ± .20

Table 1: Speed/Accuracy Comparison
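
To illustrate why hierarchy-guided retrieval can need fewer vector
comparisons than an exhaustive scan, the following sketch contrasts
the two on synthetic data. It is a simplification under stated
assumptions, not the retrieval procedure evaluated in Table 1 and not
MAC/FAC itself: the balanced binary tree, the greedy descent, the
helper names, and the 16 synthetic stories are all hypothetical.

```python
def overlap(a, b):
    """Number of features two feature sets have in common."""
    return len(a & b)


def build_tree(stories):
    """Build a balanced binary tree over (name, feature_set) pairs.

    Each internal node stores the features shared by every story beneath
    it, playing the role of a schema in this toy setting.
    """
    if len(stories) == 1:
        name, feats = stories[0]
        return {"name": name, "features": feats, "children": []}
    mid = len(stories) // 2
    children = [build_tree(stories[:mid]), build_tree(stories[mid:])]
    shared = children[0]["features"] & children[1]["features"]
    return {"name": None, "features": shared, "children": children}


def linear_retrieve(query, stories):
    """Compare the query with every story; return (best name, comparisons)."""
    best = max(stories, key=lambda s: overlap(query, s[1]))
    return best[0], len(stories)


def hierarchical_retrieve(query, root):
    """Greedily descend the tree; return (leaf name, comparisons)."""
    node, comparisons = root, 0
    while node["children"]:
        scores = [overlap(query, c["features"]) for c in node["children"]]
        comparisons += len(scores)
        node = node["children"][scores.index(max(scores))]
    return node["name"], comparisons


# Synthetic stories: story i carries one feature per prefix of its index,
# so nearby stories share more features (an assumption for illustration).
DEPTH = 4
stories = [(f"story{i}",
            {f"d{d}:{i >> (DEPTH - d)}" for d in range(1, DEPTH + 1)})
           for i in range(2 ** DEPTH)]
tree = build_tree(stories)
query = dict(stories)["story5"]
print(linear_retrieve(query, stories))
print(hierarchical_retrieve(query, tree))
```

The exhaustive scan performs one comparison per story, whereas the
descent performs two comparisons per level of the tree. A greedy
descent can also commit to a wrong branch early, which is one way a
speedup of this kind could come at some cost in accuracy.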


Parsing and top-down prediction may be used together with a
non-connectionist chaining algorithm to perform rudimentary logical
inference. Briefly, the chaining algorithm chains bindings, where a
binding is a symmetric relation stating that two variables have the
same value. If A is bound to B, and B is bound to C, then chaining
infers that A is bound to C. A simplified example of inference using
parsing, top-down prediction, and chaining is shown in Figure 2. In
this example, our system has learned analogical schemas from stories
of theft, diplomatic visits, and defaulted loans. In The Story of
Doug, the system is told that Doug loaned a spatula to Gary, who then
defaulted. Our system parses this story and uses top-down prediction
and chaining to infer that the spatula was lost. This example is
simplified in that it does not use windowing or aliasing, and the
variables are atoms rather than a sparse coding, but it shows the
basic mechanism of inference.
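
The chaining step can be sketched with a union-find (disjoint-set)
structure, which treats bindings as a symmetric, transitive relation.
This is one possible implementation, assumed here for illustration;
the text does not say how chaining is implemented, and the class name
`Bindings` and the convention of splitting a feature such as
“loaned-lost” at the hyphen are hypothetical.

```python
class Bindings:
    """Symmetric, transitive 'has the same value' relation (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        """Return the representative of x's binding class."""
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def bind(self, a, b):
        """Record that a and b are bound to each other."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def bound(self, a, b):
        """True if a and b are connected by a chain of bindings."""
        return self._find(a) == self._find(b)


def chain(features):
    """Interpret features of the assumed form 'x-y' as bindings."""
    bindings = Bindings()
    for feature in features:
        left, right = feature.split("-")
        bindings.bind(left, right)
    return bindings


# "loaned-Spatula" is given in the story; "loaned-lost" is the feature
# supplied by top-down prediction in the simplified example.
story_of_doug = ["loaned-Spatula", "loaned-lost"]
bindings = chain(story_of_doug)
print(bindings.bound("lost", "Spatula"))
```

Because the relation is stored as equivalence classes, the inferred
binding between “lost” and “Spatula” holds regardless of the order in
which the two features are processed.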

Whence come Relations, Causality, & Entities?

In the previous section, our system was presented with stories already
encoded in predicate form. An open question is how stories and other
relational structures can be learned from data that is not explicitly
relational. For example, given a large number of videos of people
interacting, how might a system learn entities such as people and
relations such as “loves”? A simpler example would be, given a large
number of static images of “billiard ball traces”, such as that shown
in Figure 3, how might a system develop entities such as “billiard
ball” and “mass” (of a billiard ball) and relations such as “bounces
off of”? We believe that this is possible in principle because a naive
model of “billiard physics” can be used to compress such images. Note
that our question differs from the questions addressed by earlier work
on relational learning (Kemp and Tenenbaum 2008; Schmidt and Lipson
2009) in that neither the entities nor the relations are provided to
our system: in the billiard example, the primitive features correspond
to pixels, and features such as mass are not directly observable.

We are currently attempting to address this question. Our approach
involves investigating how a model of billiard physics (and other
systems) can be represented in our framework (note that natural
numbers are not innate in our framework), investigating how multi-step
inference might be performed, developing an energy function (likely a
combination of compression and speed of inference (Schmidhuber 2002)),
and investigating how representations may be efficiently searched to
minimize this energy function.

Figure 2: Basic inference using bottom-up parsing, top-down
prediction, and chaining. In this simplified example, we use a
hierarchy of schemas (learned from stories shown on the lower left) to
parse The Story of Doug, which is parsed to inherit from the concept
at the top-right. This concept has the atomic feature “loaned-lost”,
which, through top-down implication, we infer to be part of The Story
of Doug. We then use a non-connectionist system to interpret the
features in The Story of Doug as bindings, and chain “loaned-lost”
with “loaned-Spatula” to infer “lost-Spatula” (i.e., the Spatula was
lost).

Figure 3: A “Billiard Ball” Trace. How might a naive model of billiard
physics be learned from many similar static images?